Antipattern: Specify Values in the Column Definition

Let's explore the insights we can derive from the antipattern 31 Flavors in this lesson.

Many people choose to specify the valid data values when they define a column. The column definition is part of the metadata, which is the definition of the table structure itself.

For example, we could define a CHECK constraint on the column. This constraint disallows any insert or update that would make the constraint false.

Creating Bugs table and defining a constraint on status column

MySQL supports a non-standard data type called ENUM that restricts the column to a specific set of values.

Creating Bugs table and using ENUM for restricting the data

In MySQL’s implementation, we declare the values as strings, but internally the column is stored as the ordinal number of the string in the enumerated list. The storage is therefore compact, but when we sort a query by this column, the result is ordered by the ordinal value, not alphabetically by the string value. This may be unexpected behavior for us.

Other solutions include domains and user-defined types (UDTs). We can use these to restrict a column to a specific set of values and conveniently apply the same domain or data type to several columns within our database. But these features are not supported widely among brands of RDBMSs yet.

Finally, we could write a trigger that contains the set of permitted values and causes an error to be raised unless the status matches one of these values.

Baskin-Robbins 31 Ice Cream

All of these solutions share some disadvantages. The following sections describe some of these problems.

What was the middle one?#

Let’s suppose that we’re developing a user interface so that a user can edit bug reports. In order to make the interface guide a user to pick one of the valid status values, we create a drop-down menu containing these values. How do we query the database for an enumerated list of values that are currently allowed in the status column?

Our first instinct may be to query all the values currently in use with a simple query like the following:

Retrieving the distinct status values

However, if all the bugs are new, the previous query returns only NEW. If we use this result to populate a user interface control with the statusof bugs, we could end up creating a chicken-and-egg situation; we can’t change a bug to any status other than those currently in use.

To get the complete list of permitted status values, we need to query the definition of that column’s metadata. Most SQL databases support system views for these kinds of queries, but using them can be complex. For example, if we used MySQL’s ENUM data type, we can use the following query to query the INFORMATION_SCHEMA system views:

Querying to retrieve the INFORMATION_SCHEMA system views

We can’t simply get the discrete enumeration values from the INFORMATION_SCHEMA in a conventional result set. Instead, we get a string containing the definition of the check constraint or ENUM data type. For example, the previous query in MySQL returns a column of type LONGTEXT, with the value ENUM(’NEW’, ’IN PROGRESS’, ’FIXED’), including the parentheses, commas, and single quotes. We must write application code to parse this string and extract the individual quoted values before we can use them to populate a user interface control.

The queries needed to report check constraints, domains, or UDTs are progressively more complex. Most people choose the arduous path of manually maintaining a parallel list of values in the application code. This is an easy way for bugs to affect our project as application data becomes out of sync with the database metadata.

Adding a new flavor#

The most common alterations are to add or remove one of the permitted values. There’s no syntax to add or remove a value from an ENUM or check constraint; we can only redefine the column with a new set of values. The following is an example of adding DUPLICATE as one new status value in the MySQL ENUM:

Modifying status column to add a value

The previous definition of the column allowed NEW, IN PROGRESS, and FIXED. This leads us back to the difficulty of querying the current set of values as described earlier.

Some database brands can’t change the definition of a column unless the table is empty. We may have to dump the contents of the table, redefine the table, and then import our saved data, making the table inaccessible in the meantime. This work is common enough that it has a name: ETL or “extract, transform, and load.” Other brands of database support restructuring a populated table with ALTER TABLE commands, but it can still be complex and expensive to perform these changes.

As a matter of policy, changing metadata — that is, changing the definition of tables and columns — should be infrequent and with attention to testing and quality assurance. If we need to change metadata to add or remove a value from an ENUM, then we either have to skip the appropriate testing or spend a lot of software engineering effort on a short notice to make the change. Either way, these changes introduce risk and destabilize your project.

Old flavors never die#

If we make a value obsolete, we could upset historical data. For example, let’s see what happens when we change our quality control process to replace FIXED with two stages, CODE COMPLETE and VERIFIED:

Modifying status column to add different data

If we remove FIXED from the enumeration, what do we do with bugs whose status was FIXED? Should we advance all FIXED bugs to VERIFIED? Should we instead set obsolete values to null or to a default value?

We may have to keep an obsolete value that old rows reference. But then, how can we distinguish between obsolete values and exclude them from our user interface so that no one can set a bug’s status to the obsolete value?

Portability is hard#

Check constraints, domains, and UDTs are not supported uniformly among brands of SQL databases. The ENUM data type is a proprietary feature in MySQL. Each brand of the database may have a different limit on the length of the list that we can give in a column definition. Trigger languages vary as well. These variations make it hard to choose a solution if we need to support multiple brands of databases.

Synopsis: 31 Flavors
Solution: Specify Values in Data
Mark as Completed
Report an Issue